WHAT IS EXPLICIT SYNCHRONIZATION?

• A fence is an abstract primitive that marks completion of an operation
• Implicit synchronization
  • Fences are attached to buffers
  • Kernel manages fences automatically based on buffer read/write access
  • Currently used by DRM (dma-buf fences)
• Explicit synchronization
  • Fences are passed around independently
  • Kernel takes and emits fences to/from user space when submitting work
  • Currently used on Android (sync fence fd's)

ADVANTAGES

• Improved performance of bindless graphics APIs
• Better alignment with user space graphics APIs
• Allows parallel processing of user space suballocations
• Fits in nicely with explicit buffer handoffs

BINDLESS GRAPHICS PERF IMPROVEMENTS

• Bindless graphics and compute APIs allow building very large working sets that any given command buffer can reference
  • References can be by runtime-generated virtual address rather than slots or enums
  • These working sets can be shared across multiple contexts or command queues
• Implicit sync may force serialization in these cases
• Locking and updating fences for every active buffer is costly
  • Working set sizes can be thousands of buffers




ALIGNS WITH USER SPACE GRAPHICS APIS

• Developers are demanding explicit control of driver behavior and hardware whenever possible
• Current-generation OpenGL is defined in terms of explicit synchronization
  • EGLSync, GLSync
• "Hidden" ordering dependencies and stalls caused by implicit sync are at odds with these design philosophies

USER SPACE SUBALLOCATION

• User space drivers and applications use suballocation for performance reasons
  • By definition, the kernel has no visibility into this process
• Operations on separate portions of a buffer should be allowed to proceed in parallel
  • Even if they reside in one kernel-visible buffer
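
As a rough illustration of the point above, user space could track each suballocation's byte range together with the fence of the last work that touched it, and make new work wait only when ranges actually overlap. This is a minimal sketch under assumptions: `struct suballoc` and both helper names are hypothetical and are not part of Nouveau or DRM, and offset + size is assumed not to overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user space bookkeeping for one suballocation. */
struct suballoc {
    size_t offset;      /* byte offset inside the kernel-visible buffer */
    size_t size;        /* length in bytes */
    int last_fence_fd;  /* fence of the last work touching this range, -1 if idle */
};

/* True if the half-open byte ranges [offset, offset + size) intersect. */
static bool suballoc_overlaps(const struct suballoc *a, const struct suballoc *b)
{
    return a->offset < b->offset + b->size &&
           b->offset < a->offset + a->size;
}

/* Fence that work on `next` must wait for because of `prev`, or -1 if the
 * ranges are disjoint and the two operations may proceed in parallel. */
static int suballoc_wait_fence(const struct suballoc *prev,
                               const struct suballoc *next)
{
    assert(prev != NULL && next != NULL);
    return suballoc_overlaps(prev, next) ? prev->last_fence_fd : -1;
}
```

With implicit sync the kernel sees only the single enclosing buffer, so it cannot make this distinction and would order both operations.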

EXPLICIT INTEROP HANDOFFS

• Modern processors have many specialized engines
  • Video processing
  • 3D/2D graphics
  • CPU cores
• Each of these may have its own caches, memory compression engines, or other specialized memory access quirks
• When buffers are shared between them, engine-specific state transitions may be needed
  • These may be costly operations and may be difficult to perform just-in-time
• Simplest solution is for user space to request them explicitly
  • Might as well do explicit synchronization in the same code path

IMPLICIT SYNC EXAMPLE

[Diagram: timelines for Channel 1, Channel 2, and Channel 3]

nouveau_pushbuf_kick(push1, chan1);

    Kernel side of each kick:

    for (each buffer in working set)
        acquire ww_mutex

    for (each buffer in working set)
        program wait-fence cmd

    submit work

    for (each buffer in working set) {
        store fence
        release ww_mutex
    }

nouveau_pushbuf_kick(
    struct nouveau_pushbuf *push,
    struct nouveau_object *chan)

nouveau_pushbuf_kick(push1, chan1);

// push2 has no dependencies, but kernel enforces a wait
nouveau_pushbuf_kick(push2, chan2);

// push3 depends on push1 only, but user space cannot
// express that to kernel
nouveau_pushbuf_kick(push3, chan3);
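
The serialization in this example can be reproduced with a toy model. The sketch below is not Nouveau code: `struct toy_buffer` and `toy_implicit_submit` are made-up names, and a fence is reduced to the id of the submission that attached it. It only illustrates why every kick that shares the working set ends up chained behind the previous one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy buffer: remembers which submission last attached a fence to it
 * (0 means no fence is attached). */
struct toy_buffer {
    int fence_owner;
};

/* Implicit sync: wait on the fence of every buffer in the working set,
 * then attach our own fence to all of them. Returns a bitmask of the
 * submission ids this submission had to wait for. */
static uint32_t toy_implicit_submit(struct toy_buffer *ws, size_t n, int id)
{
    uint32_t waits = 0;

    assert(id > 0 && id < 32);
    for (size_t i = 0; i < n; i++)
        if (ws[i].fence_owner != 0)
            waits |= UINT32_C(1) << ws[i].fence_owner;
    for (size_t i = 0; i < n; i++)
        ws[i].fence_owner = id;
    return waits;
}
```

Submitting ids 1, 2, and 3 against the same working set makes submission 2 wait on 1 and submission 3 wait on 2, regardless of what the work actually depends on.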

EXPLICIT SYNC EXAMPLE

[Diagram: timelines for Channel 1, Channel 2, and Channel 3]

nouveau_pushbuf_kick_fence(
    struct nouveau_pushbuf *push,
    struct nouveau_object *chan,
    int waitFenceFd,
    int *emitFenceFd)

int fence1 = -1;
nouveau_pushbuf_kick_fence(push1, chan1, -1, &fence1);
// fence1 now signals when push1 completes



int fence1 = -1;
nouveau_pushbuf_kick_fence(push1, chan1, -1, &fence1);
// fence1 now signals when push1 completes

int fence2 = -1;
nouveau_pushbuf_kick_fence(push2, chan2, -1, &fence2);
// fence2 now signals when push2 completes

// the last operation depends on fence1 only
nouveau_pushbuf_kick_fence(push3, chan3, fence1, NULL);

int fence1 = -1;
nouveau_pushbuf_kick_fence(push1, chan1, -1, &fence1);
// fence1 now signals when push1 completes

int fence2 = -1;
nouveau_pushbuf_kick_fence(push2, chan2, -1, &fence2);
// fence2 now signals when push2 completes

// the last operation depends on fence1 and fence2
int merged = sync_merge(fence1, fence2);
nouveau_pushbuf_kick_fence(push3, chan3, merged, NULL);
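
The merge step relies on one rule: a merged fence signals only once every fence that went into it has signaled. The toy model below captures just that rule. It is an assumption-laden sketch, not the Android sync implementation: real fences are file descriptors and merging returns a new descriptor, whereas here a fence is a bitmask of submission ids and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy fence: bitmask of submission ids that must complete before it signals. */
typedef uint32_t toy_fence;

/* Fence emitted by submitting work with the given id. */
static toy_fence toy_fence_for(int id)
{
    assert(id >= 0 && id < 32);
    return UINT32_C(1) << id;
}

/* A merged fence waits for everything either input waits for. */
static toy_fence toy_fence_merge(toy_fence a, toy_fence b)
{
    return a | b;
}

/* `completed` is the bitmask of submissions that have finished. */
static bool toy_fence_signaled(toy_fence f, uint32_t completed)
{
    return (f & ~completed) == 0;
}
```

In this model push3 waiting on the merge of fence1 and fence2 cannot start until both push1 and push2 have completed, while waiting on fence1 alone leaves it independent of push2.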

RESIDENCY AND PINNING

• When we need to swap out or unmap a buffer, we need to wait until it is no longer accessed by hardware
• This is not the perf-critical case, so we can be conservative in order to optimize the critical path. For example, on Nouveau:
  • Store one fence to the channel VM at each submit
  • Use that fence when evicting or unmapping buffers
  • No need to lock / update fences for every buffer individually at submit
• All this is driver-specific logic, not common DRM
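
The trade-off can be sketched as follows, assuming fences are monotonically increasing sequence numbers. That representation, `struct toy_channel_vm`, and both function names are assumptions made for illustration and do not come from the Nouveau driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-channel VM state: one fence for the whole address space. */
struct toy_channel_vm {
    uint64_t last_submit_seqno;  /* fence stored at the most recent submit */
};

/* Submit path: a single store, independent of working set size. */
static void toy_vm_note_submit(struct toy_channel_vm *vm, uint64_t seqno)
{
    assert(seqno >= vm->last_submit_seqno);
    vm->last_submit_seqno = seqno;
}

/* Eviction path: conservative check that hardware has finished everything
 * submitted on this VM, whether or not it touched the buffer in question. */
static bool toy_vm_can_evict(const struct toy_channel_vm *vm,
                             uint64_t completed_seqno)
{
    return completed_seqno >= vm->last_submit_seqno;
}
```

Eviction may wait longer than strictly necessary, but the submit path no longer scales with the number of buffers.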

PATH FROM IMPLICIT SYNC -> EXPLICIT SYNC

• No need to disrupt the existing model
  • If a particular device is happy with implicit sync, it can keep using it
• Allow kernel and user space drivers that prefer explicit sync to opt in:
  • Allow user space to handle intra-driver synchronization explicitly
  • Allow user space to associate synchronization primitives with buffers for backwards compatibility with current APIs and drivers
  • Move towards tracking working sets rather than individual buffers for object lifetime/work completion/paging purposes

THANKS!

• drivers/staging/android/sync.c
• [RFC] Explicit synchronization for Nouveau (+ RFC patches)
• dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org
• Let's discuss more over lunch/dinner!

BACKUP

DEADLOCKS?

• Circular dependencies can be avoided if fences are only generated in the kernel when work is submitted
  • This guarantees that user space cannot ask the kernel to wait for a fence whose work will be submitted later
• Deadlocks can be avoided if, additionally, all submitted work completes in finite time
  • This assumption might fail for implicit fences also
  • Timeout mechanisms
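
On the user space side, one possible timeout mechanism is a bounded wait on the fence file descriptor. The sketch below assumes the fence fd becomes readable under poll() once it signals, as Android sync fence fds do. The helper names `fence_wait` and `fence_wait_and_close` are hypothetical, and reporting a timeout as ETIME is a choice made here, not a requirement.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait until fence_fd becomes readable (signaled) or timeout_ms elapses.
 * Returns 0 on signal; -1 with errno set otherwise (ETIME on timeout).
 * A retry after EINTR restarts the full timeout. */
static int fence_wait(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    assert(fence_fd >= 0);
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    if (ret == 0) {
        errno = ETIME;
        return -1;
    }
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Bounded wait, then release the descriptor either way so it does not leak. */
static int fence_wait_and_close(int fence_fd, int timeout_ms)
{
    int ret = fence_wait(fence_fd, timeout_ms);
    int saved_errno = errno;

    close(fence_fd);
    errno = saved_errno;
    return ret;
}
```

A caller that gets ETIME can then decide whether to retry, report a hang, or tear down the channel instead of blocking forever.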

EXPLICIT SYNC VS. ANDROID SYNC FD'S

• Could also be a process-local handle?
• But should support conversion to and from Android sync fd's